Classification of Words Based on Affix Evidence
Authors
Abstract
The category of a word, as may be indicated in a lexicon, is useful information for most linguistic tasks, so determining the categories of words in a natural language is important. Much work has been done on part-of-speech tagging of words in text, where the tagset is usually predetermined and provided to the system. In this work we attempt to determine the categories of the root words of a language by considering how suffixes are used in a corpus of a (highly) inflectional language. This information can be used in a lexicon. Instead of using a predefined list of word categories, we first identify the underlying word categories using a set-theoretic representation of the suffix-usage information available from the input corpus.
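The abstract does not spell out the set-theoretic representation, so the following is only a minimal sketch of one possible reading: record, for each candidate root, the set of suffixes it is observed with, and group roots whose suffix sets coincide. The function names, the toy corpus, the suffix list, and the naive string-based stripping are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def suffix_sets(words, suffixes):
    """Map each candidate root to the set of suffixes it is observed with.

    Assumption: a root is whatever remains after stripping a known,
    non-empty suffix from the end of a word (no real morphological analysis).
    """
    observed = defaultdict(set)
    for word in words:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                observed[word[: -len(suffix)]].add(suffix)
    return observed


def group_by_suffix_set(observed):
    """Group roots whose observed suffix sets are identical."""
    groups = defaultdict(set)
    for root, used in observed.items():
        groups[frozenset(used)].add(root)
    return groups


# Hypothetical toy corpus and suffix inventory, for illustration only.
corpus = ["walked", "walking", "talked", "talking", "cats", "dogs"]
suffixes = ["ed", "ing", "s"]
groups = group_by_suffix_set(suffix_sets(corpus, suffixes))
```

In this toy example the roots that take both "ed" and "ing" land in one group and the roots that take only "s" land in another, which is the kind of category signal the abstract alludes to. A real corpus would need tolerance for sparse evidence, for example comparing suffix sets by subset or overlap rather than strict equality.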
Similar resources
Affix Productivity and Base Productivity
Morphological productivity is generally seen as affix-driven, with a given affix selecting base words satisfying a range of formal and semantic selectional restrictions. Plag (1999), however, has shown that there are base-driven selectional restrictions, and Hay (2000) and Hay & Baayen (2002) have shown that the frequency relation between derived and base word is correlated with affix productiv...
Morpheme Segmentation from Distributional Information
Morphology is the study of how meaningful components of form are combined to make complex words. Understanding how such complex words can be ‘broken apart’ into their morphological constituents is the problem of morpheme segmentation. While words that have similar meanings tend to share similar forms (e.g., run and running), many morphemes do not have transparently shared meanings. For example,...
Effect of Productivity on Export: New Evidence from Iran's Manufacturing Industries
Based on the recent literature on heterogeneous firms, productive firms self-select into foreign markets. In this framework, there is a productivity rise prior to exporting. In other words, differences in export performance across firms are linked to their heterogeneity. The main purpose of the present paper is to examine the so-called heterogeneous-firm hypothesis in Iran. For ...
An Affix Stripping Morphological Analyzer for Turkish
This paper presents the design and the implementation of a morphological analyzer for Turkish. A new methodology is proposed for doing the analysis of Turkish words with an affix stripping approach and without using any lexicon. The rule-based and agglutinative structure of the language allows Turkish to be modeled with finite state machines (FSMs). In contrast to the previous works, in this st...
Lexical frequency and acoustic reduction in spoken Dutch.
This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with differe...
The neural bases of the learning and generalization of morphological inflection.
Affixal inflectional morphology has been intensively examined as a model of productive aspects of language. Nevertheless, little is known about the neurocognition of the learning and generalization of affixal inflection, or the influence of certain factors that may affect these processes. In an event-related fMRI study, we examined the neurocognition of the learning and generalization of plural...